-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
builtins: add generator builtin for span payloads #60784
builtins: add generator builtin for span payloads #60784
Conversation
01fc0ba
to
953ebe7
Compare
pkg/sql/logictest/testdata/logic_test/contention_event, line 68 at r2 (raw file):
This PR is not ready yet because I am trying to figure out why the builtin (named I seem to be adhering to the way to return a row in the generator overload correctly, and in the After I can get the builtin to return the columns, I will modify this logic test to look more like the one above (filtering on the |
e283b62
to
e997837
Compare
It does return two columns. Try this: |
@knz when I do so I only get one column, which is called
If I run it with |
I wrote |
A ha I missed that! I see. So it is an issue with the way my query is written and not in the generator. |
It's not really an "issue" - that's the regular, expected semantics of set-returning functions.
|
Understood, I meant it was an 'issue' with the way I wrote the logic test and expected a table after using the generator as a scalar. I'm having trouble with the syntax then — how can I generate a table comprised of sub-tables that each represent the return value of the SRF as a table for a span ID? Basically I want to write something like
but I cannot pass in a table as the arguments to the SRF. |
You need to use the LATERAL keyword as explained here https://www.postgresql.org/docs/13/queries-table-expressions.html |
e997837
to
f47f53f
Compare
pkg/sql/logictest/testdata/logic_test/builtin_function, line 2921 at r3 (raw file):
I wanted to write a logic test for the schema of the built-in but I'm not sure I can simply pass in 0 as the span — it is possible that a span's ID is 0 right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @knz TIL about LATERAL!
The PR is now ready for review, and I added a couple of questions/comments of mine as well related to testing.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)
pkg/sql/logictest/testdata/logic_test/contention_event, line 62 at r3 (raw file):
) SELECT count(*) > 0 FROM payloads WHERE payload_type = '"type.googleapis.com/cockroach.roachpb.ContentionEvent"';
I fetch the value of the type by using FetchValKey()
in JSON which replicates the ->
operator in the JSONB builtin. There is no replication for ->>
which prints the value in string format (without the double quotes). Is this an alright user experience? I'm not sure a user would know to filter on a double-quoted in the string unless they SELECT *
from the table first.
I can strip the double quotes in a hacky way, unless there is a better way.
f47f53f
to
d22ede5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r1, 3 of 5 files at r3, 1 of 4 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, @knz, and @tbg)
pkg/sql/logictest/testdata/logic_test/builtin_function, line 2921 at r3 (raw file):
Previously, angelapwen (Angela P Wen) wrote…
I wanted to write a logic test for the schema of the built-in but I'm not sure I can simply pass in 0 as the span — it is possible that a span's ID is 0 right?
Very unlikely, but yes. We could instead try using a negative number, which is definitely not possible.
pkg/sql/logictest/testdata/logic_test/contention_event, line 62 at r3 (raw file):
) SELECT count(*) > 0 FROM payloads WHERE payload_type = '"type.googleapis.com/cockroach.roachpb.ContentionEvent"';
I wonder if we should strip out everything but roachpb.ContentionEvent. Seems like gunk that would be visible everywhere.
pkg/sql/sem/builtins/generator_builtins.go, line 344 at r3 (raw file):
payloadsForSpanGeneratorType, makePayloadsForSpanGenerator, "Returns the payload(s) of the span whose ID is passed in the argument.",
Returns the payload(s) for the requested span.
pkg/sql/sem/builtins/generator_builtins.go, line 1396 at r3 (raw file):
// over all recordings for a given Span. type payloadsForSpanGenerator struct { // The span to iterate over, set by the constructor.
No need to mention that it's set by the constructor.
pkg/sql/sem/builtins/generator_builtins.go, line 1399 at r3 (raw file):
span *tracing.Span // recordingIndex maintains the current position of the index of the iterator
I understand that we're not iterating through all child recordings at the very beginning to try and be a bit more "stream-ey" about it, but does that actually help us with anything? We already have a handle on the tracing.Span, which already has all the payloads accessible to it in-memory. So it's already been allocated somewhere. What's the downside of simply slurping up everything the very first time and flattening it out in payloads
?
(Asking more for my own understanding. The PR as written also obviously works, but I didn't realize there was a need to do anything fancy here.)
pkg/sql/sem/builtins/generator_builtins.go, line 1426 at r3 (raw file):
spanID := uint64(*(args[0].(*tree.DInt))) span, found := ctx.Settings.Tracer.GetActiveSpanFromID(spanID) // A span may not be found if its ID was surfaced previously but its
Or it may just be a made up ID, we can skip saying anything here.
d22ede5
to
e727493
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)
pkg/sql/logictest/testdata/logic_test/builtin_function, line 2921 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Very unlikely, but yes. We could instead try using a negative number, which is definitely not possible.
I'm worried about using a negative number because of this issue: #60668 — we are coercing a Tree.DInt
(which is an int64) into a uint64, which is what the span ID is saved as internally, so I believe a negative number is also likely to be cast into some possible span ID.
pkg/sql/logictest/testdata/logic_test/contention_event, line 62 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I wonder if we should strip out everything but roachpb.ContentionEvent. Seems like gunk that would be visible everywhere.
I could write a function to strip it out but worried it would be a bit hacky. The other possible event type right now is "type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats"
so I guess that could just be sql/distqlrun.ComponentStats
if I stripped out "type.googleapis.com/cockroach"
?
pkg/sql/sem/builtins/generator_builtins.go, line 344 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Returns the payload(s) for the requested span.
Done.
pkg/sql/sem/builtins/generator_builtins.go, line 1396 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
No need to mention that it's set by the constructor.
Done.
pkg/sql/sem/builtins/generator_builtins.go, line 1399 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I understand that we're not iterating through all child recordings at the very beginning to try and be a bit more "stream-ey" about it, but does that actually help us with anything? We already have a handle on the tracing.Span, which already has all the payloads accessible to it in-memory. So it's already been allocated somewhere. What's the downside of simply slurping up everything the very first time and flattening it out in
payloads
?(Asking more for my own understanding. The PR as written also obviously works, but I didn't realize there was a need to do anything fancy here.)
The memory issue was something @knz had brought up in the non-generator builtin PR so I will defer to him to answer this. I hadn't realized that all the payloads were already accessible in-memory and thought they were accessed on the fly when the Structured()
visitor is called.
pkg/sql/sem/builtins/generator_builtins.go, line 1426 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Or it may just be a made up ID, we can skip saying anything here.
Done.
e727493
to
ba06d6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as a non-expert in sql code, I'll defer to you+Rafa.
pkg/sql/logictest/testdata/logic_test/builtin_function, line 2921 at r3 (raw file): Previously, tbg (Tobias Grieger) wrote…
Thank you! That makes sense. Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
although, as discussed elsewhere - I do not see the payload_pretty
column as useful nor necessary, and in fact it nearly doubles memory usage of the results.
Reviewed 1 of 5 files at r1, 4 of 6 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)
pkg/sql/sem/builtins/generator_builtins.go, line 1399 at r3 (raw file):
We already have a handle on the tracing.Span, which already has all the payloads accessible to it in-memory. So it's already been allocated somewhere
It's allocated somewhere but it's not accounted and not restricted by memory budgets. That is in fact a problem and a bug that needs to be fixed.
What's the downside of simply slurping up everything the very first time and flattening it out in payloads?
-
The JSON blobs are alloc-heavy and amplify the allocations from the span by a factor 2x-3x. This is not needed.
-
The streaming approach ensures that all the individual bits go through the explicit memory budgeting limits.
FROM payload_types | ||
WHERE payload_type = 'type.googleapis.com/cockroach.roachpb.ContentionEvent'; | ||
FROM payloads | ||
WHERE payload_type = 'roachpb.ContentionEvent' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, how does it actually render the contention event?
Previously we introduced a `crdb_internal.payloads_for_span` builtin that returned a JSONB array representing all payloads for a particular span. To improve the user experience of the builtin as well as prevent OOMs resulting from builtin usage, we replace it with a generator builtin that returns a table representing all payloads for a span, where each row represents a single payload. The new builtin, also called `crdb_internal.payloads_for_span`, has columns for the `payload_type` so that the user has the ability to easily filter on the type, and `payload_jsonb` so the user can use jsonb builtins to filter further. Release note (sql change): Update `crdb_internal.payloads_for_span` builtin to return a table instead of a JSONB array. Each row of the table represents one payload for the given span. It has columns for `payload_type` and `payload_jsonb`.
737a05c
to
8210796
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raphael convinced me that we don't need the pretty column as there is already a builtin called jsonb_pretty
that the user could use on the payload_jsonb
value if they wanted to view it that way ✨
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)
pkg/sql/logictest/testdata/logic_test/contention_event, line 62 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I'm curious, how does it actually render the contention event?
JSONB output:
{"@type": "type.googleapis.com/cockroach.roachpb.ContentionEvent", "duration": "0.006409s", "key": "vYkSawABiA==", "txnMeta": {"epoch": 0, "id": "aed8bb08-cbb2-4c16-9edc-f46864079f0d", "key": "vYkSawABiA==", "minTimestamp": {"logical": 0, "synthetic": false, "wallTime": "1613988895236491000"}, "priority": 785132, "sequence": 1, "writeTimestamp": {"logical": 0, "synthetic": false, "wallTime": "1613988895236491000"}}}
BTW I was just trying to use this in #60754 (comment) and it's hard to use because it's |
Huh, I tried a manual attempt (the insert took ~4s because I was holding an intent open in another txn) and I'm not seeing any contention meta:
|
(btw this is on master, not this PR) |
Yes, that's the plan! Raphael mentioned that we could add this within the SQL layer as a delegate for syntactic sugar on the |
This seems to be a problem unrelated to this PR. I would advise filing a separate issue. |
I'm currently trying to repro in a unit test. Will file an issue as appropriate, definitely not something for this PR. |
I'm also not sure why your 2nd column populates as the scalar version of the builtin 🤔 When I run the same query I get:
|
bors r=irfansharif,knz |
@tbg was background tracing on when you tried it? I'm surprised you also didn't see the
|
Well, I guess it would have to be to get the span IDs.. |
Heh, I trip over this every time! I bet it's this one: #59315 |
Sigh of relief 😌 |
Build failed: |
Unrelated test flakes. bors r=irfansharif,knz |
It's nice that it works, but the final query is a monster! I'm glad there are [plans] to improve the UX. [plans]: cockroachdb#60784 (comment) Release note: None
Build succeeded: |
Currently this builtin, We may want to consider changing the builtin name to |
Following up from #60616. Part of addressing #55733.
Previously we introduced a
crdb_internal.payloads_for_span
builtin that returned a JSONB array representing all payloads for
a particular span. To improve the user experience of the builtin
as well as prevent OOMs resulting from builtin usage, we replace it
with a generator builtin that returns a table representing all payloads
for a span, where each row represents a single payload.
The new builtin, also called
crdb_internal.payloads_for_span
, hascolumns for the
payload_type
so that the user has the ability toeasily filter on the type, and
payload_jsonb
so the user can usejsonb builtins to filter further.
Release note (sql change): Update
crdb_internal.payloads_for_span
builtin to return a table instead of a JSONB array. Each row of the
table represents one payload for the given span. It has columns for
payload_type
andpayload_jsonb
.